l10n/i18n talk session with Christian Perrier, debconf4 at POA.


Agenda 

> Following  Christian's  talk, 

> How to improve your packages' internationalization and localization

> Abbreviations 

> i18n : Internationalization

> 18 characters (between "i" and "n")

> l10n : Localization

> 10 characters (between "l" and "n")


Imagine! 


There are TWO Word Processors here...


A.

[Screenshot: word processor whose menu bar is localized]


B.

[Screenshot: word processor with an English menu bar: File, Edit, View, Insert, Format, Tools, Window, Help]


Which  do  you  prefer? 


Hello,  hello,  Debian. 


A. Menu is localized, content is NOT localized

B. Menu is NOT localized, content is localized


Some understanding is needed...

> You need a basic knowledge of i18n/l10n


Language 


Many, many languages exist in the world

> Latin (English, French, German, ...): "Welcome"

> BIDI (Arabic, Hebrew)

> CJK (Chinese, Japanese, Korean)

> More (Thai, Hindi, ...)


Character set and Encoding


A = 0x41 (ASCII, ISO8859-1)
€ = 0xa4 (ISO8859-15)
あ = 0xa4 0xa2 (EUC-JP)
あ = 0x82 0xa0 (Shift JIS)


> Character set

> set of acceptable characters for each language
(ASCII, JISX0208, ...)

> Encoding (Encoded character set)

> maps each character of a character set to an ID number for the computer

> Many encodings...

> ISO-8859-1 (Latin-1), ISO-8859-15 (Euro),
ISO-2022-JP, EUC-JP, ShiftJIS, Big5,
GB2312, KOI8-R, ....
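
To make the difference between a character and its encoded bytes concrete, here is a minimal sketch in Python. Python itself and its codec names (`iso-8859-15`, `euc_jp`, `shift_jis`) are assumptions of this example and were not part of the talk; the byte values are the ones listed on the slide.

```python
# Encode sample characters under several encodings and show the bytes.
# Output is kept ASCII-only (code points instead of glyphs) so that it
# prints safely on any terminal.
SAMPLES = [
    ("A", "ascii"),
    ("A", "iso-8859-1"),
    ("\u20ac", "iso-8859-15"),  # euro sign
    ("\u3042", "euc_jp"),       # hiragana "a"
    ("\u3042", "shift_jis"),
]


def hex_bytes(char, encoding):
    """Return the encoded form of char as a string such as '0xa4 0xa2'."""
    return " ".join(f"0x{b:02x}" for b in char.encode(encoding))


for char, encoding in SAMPLES:
    print(f"U+{ord(char):04X} = {hex_bytes(char, encoding)} ({encoding})")
```

The same code point (U+3042) comes out as two different byte sequences, which is exactly why a program has to know which encoding its data is in.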


Mojibake 

> What's “mojibake”? (文字化け)

> Broken screen (we can't read the characters)

> English-speaking developers rarely see it, but we
(Japanese) encounter it very often

> Why?

> Mismatched encoding or mismatched font

> Screen  problem 

> Bad  toolkit  or  bad  design 
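
An encoding mismatch is easy to reproduce. The sketch below (Python, chosen for this example only; the helper name `misread` is hypothetical) writes a character as EUC-JP and reads the bytes back as ISO-8859-1, which is one typical way mojibake arises.

```python
def misread(text, written_as, read_as):
    """Encode text with one encoding, then decode the bytes with another."""
    return text.encode(written_as).decode(read_as, errors="replace")


original = "\u3042"  # hiragana "a"
garbled = misread(original, "euc_jp", "iso-8859-1")

# The bytes themselves are intact, so decoding them with the right
# encoding recovers the original character.
restored = garbled.encode("iso-8859-1").decode("euc_jp")

print(ascii(original), "->", ascii(garbled), "->", ascii(restored))
```

Here the single character turns into two unrelated Latin-1 symbols. Nothing is lost at the byte level; the reader simply applied the wrong mapping.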


How to make your package more l10n and i18n?


gettext


> Message catalog database

> Switch messages using the LANG environment variable

> Relates msgid and localized message

msgid "Debian installer main menu" (message ID)
msgstr "Debian ..." (Japanese)
msgstr "Menu principal de l'installateur Debian" (French)
msgstr "Menu principal do instalador Debian" (Brazilian Portuguese)

> many packages use this for l10n (such as
debconf messages)

> core architecture of debian-installer l10n

> Many bindings

> C, shell, Perl, Python, Ruby, Java, ...
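
The msgid-to-msgstr lookup can be sketched with Python's `gettext` binding. Real applications load compiled `.mo` catalogs with `gettext.translation()`; to keep this example self-contained, the hypothetical `DictTranslations` class and `CATALOGS` table below stand in for those files, and `translation_for` is an illustrative helper that imitates selection by the LANG value.

```python
import gettext

# Hypothetical in-memory catalogs standing in for compiled .mo files.
CATALOGS = {
    "fr": {
        "Debian installer main menu":
            "Menu principal de l'installateur Debian",
    },
    "pt_BR": {
        "Debian installer main menu":
            "Menu principal do instalador Debian",
    },
}


class DictTranslations(gettext.NullTranslations):
    """Look up a msgid in a dict; fall back to the msgid itself."""

    def __init__(self, catalog):
        super().__init__()
        self._dict = catalog

    def gettext(self, message):
        return self._dict.get(message, message)


def translation_for(lang_value):
    """Pick a catalog from a LANG-style value such as 'pt_BR.UTF-8'."""
    name = lang_value.split(".")[0].split("@")[0]
    # Try the full name first ("pt_BR"), then the bare language ("pt").
    for key in (name, name.split("_")[0]):
        if key in CATALOGS:
            return DictTranslations(CATALOGS[key])
    return gettext.NullTranslations()  # untranslated: msgid is returned


for lang in ("fr_FR.UTF-8", "pt_BR.UTF-8", "C"):
    _ = translation_for(lang).gettext
    print(lang, "->", _("Debian installer main menu"))
```

When no catalog matches, the msgid itself is shown, which is why the msgid should be a readable message in its own right.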


gettext 

> Commonly misunderstood points...

> Don't use non-ASCII characters in msgid

> Use s(n)printf and %s in msgid for dynamic
values

> Does the msgid really need l10n?

> Gettext isn't the “Silver bullet”...

> Remember the Word Processor question


Toolkit  library 


> Use i18n ready libraries

> GTK+ (2.0 is better)

> Qt

> Don't set a specific font as default

> XLFD: fixed (?)

> FreeType: serif, sans-serif, or monospace

> Configurable is better


Toolkit  library 

> Input  method  problem 

> How  to  input  your  local  characters? 

> Application  on  terminal 

> Depends  on  terminal  software 

> Application  on  X Window  System 

> Only a few modern toolkits can handle it

> Immodule,  XIM 


Use  internal  encoding 


> Before processing a string, unify its encoding

> Use character units instead of byte units

あ = 0xa4 0xa2 (EUC-JP)

> 32bit Unicode

あ = 0x00003042 (UCS-4)

> GNU libc (iconv(3))

> Wide character (wchar_t)

> C, C++

> mbstowcs(3)
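
The same idea, shown as a minimal Python sketch rather than C: external bytes are decoded once at the boundary, and the program then works on characters. Treating Python's `str` as the internal form is an assumption of this example (in C the slide's wchar_t or UCS-4 plays that role), and `to_internal` is a hypothetical helper name.

```python
def to_internal(raw, encoding):
    """Decode external bytes into a Unicode str, the internal form."""
    return raw.decode(encoding)


euc = b"\xa4\xa2"    # hiragana "a" as EUC-JP bytes
sjis = b"\x82\xa0"   # the same character as Shift JIS bytes

a = to_internal(euc, "euc_jp")
b = to_internal(sjis, "shift_jis")

# After unification the two inputs compare equal, and length is
# counted in characters rather than bytes.
print(a == b)            # same character from both sources
print(len(euc), len(a))  # byte count vs. character count

# The 32-bit form of the character (big-endian, no byte order mark).
print(a.encode("utf-32-be").hex())
```

Once everything is in one internal form, string comparison, length, and slicing no longer depend on where the data came from.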


Use  internal  encoding 

> Respect the user's locale setting. The user's locale
(encoding) can be obtained with:

> locale charmap

> (returns values such as ANSI_X3.4-1968, EUC-JP)

> Use 'LANG=C' or 'LC_ALL=C' if you want to
ignore the locale setting

> Applications should let the user choose:

> Input/Output file encoding
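
A possible way to combine both rules, sketched in Python: use the locale's encoding by default, but let an explicit user choice win. The function names `locale_encoding` and `choose_encoding` are hypothetical; `locale.getpreferredencoding` and `codecs.lookup` are standard library calls.

```python
import codecs
import locale


def locale_encoding():
    """Return the encoding of the user's locale (like `locale charmap`)."""
    try:
        # Adopt the user's locale settings from the environment.
        locale.setlocale(locale.LC_CTYPE, "")
    except locale.Error:
        pass  # invalid or unsupported locale: keep the current one
    return locale.getpreferredencoding(False)


def choose_encoding(user_choice=None):
    """Prefer an explicit user choice; otherwise follow the locale.

    Returns Python's canonical codec name and raises LookupError for
    an unknown encoding name.
    """
    name = user_choice or locale_encoding()
    return codecs.lookup(name).name


print(choose_encoding())          # depends on the environment
print(choose_encoding("EUC-JP"))  # explicit choice overrides the locale
```

In a real program the `user_choice` value would typically come from a command-line option or a configuration setting for input and output files.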


Conclusion 

> There are many languages in the world!
A wrong implementation causes “Mojibake”.

> A modern toolkit is recommended for
X applications.

> Use an internal encoding, such as UCS-4 or
wide characters.

> Let's try to make your application l10n/i18n
ready! More l10n/i18n makes users happier. :-)

Happy Hacking!


